Abstract: "Spatial statistics" is an academic field that deals with the statistical analysis of spatial data, and it has been applied to econometrics and various other policy fields. These methods can now be readily applied by hematologists and oncologists using capable and much less expensive software. To encourage physicians to use these methods, this review introduces the methods and demonstrates the analyses with sample data using R and FleXScan, both of which can be downloaded free of charge. It is demonstrated that physicians can use spatial analysis to analyze hematological diseases. In addition, applying the technique presented to the investigation of patient prognoses may generate data that are also useful for solving health policy-related problems, such as the optimal distribution of medical resources.
Keywords: leukemia, malignant lymphoma, Tango's index, spatial regression model 

Introduction 

"Spatial statistics" is an academic field that deals with the statistical analysis of spatial 
data. In the field of epidemiology, Snow created a cholera map in the 19th century to reveal the spatial unevenness in the distribution of cholera patients during an outbreak in London, and he used it as the basis for measures to prevent cholera. This was an early formulation of what is today called spatial clustering, and its modern applications have developed into spatial epidemiology, which is directed toward risk assessment for infectious and various other diseases. 1 "Spatial statistics" has also been applied to many other fields, including econometrics and various policy fields. 2

Implementing spatial statistics requires a statistics package that supports specialized statistical techniques, but in recent years R software 3 and FleXScan software, 4 both of which are statistical packages for spatial statistics, have become available free of charge. It is also essential to use a geographic information system (GIS). A GIS is a system for linking text, numbers, images, or similar data to a map, reproducing it on a computer, and integrating, analyzing, or presenting as an easy-to-understand map various forms of information tied to locations and positions; GIS has been widely used in disaster management and in business settings. GIS software includes not only commercial packages, such as ArcGIS (ESRI; Redlands, CA, USA), but also free software, such as Quantum GIS (QGIS Development Team; Quantum GIS Geographic Information System. Open Source Geospatial Foundation Project, http://qgis.osgeo.org ), 5 so environments are now in place that allow clinicians to conduct spatial epidemiological research.

Regional clustering can help elucidate the etiology of hematological and oncological diseases, such as adult T-cell leukemia. 6 The study of regional clustering is expected to lead to the identification of risk factors and a better understanding of the pathology of these diseases. Because the uneven distribution of diseases is thought to depend also on the availability of medical services capable of properly diagnosing hematological diseases, spatial analysis of hematological diseases would also be useful in the field of health policy. 7,8

Yamagata Prefecture, which is located about 300 km north of Tokyo and has a population of about 1.2 million, has a regional cancer registry of the highest precision in Japan, and it is one of the few prefectures where the incidence of cancer can be comprehensively determined. Therefore, these registry data were used to perform a spatial analysis of hematological diseases with a spatial statistics package, as a guide for hematologists and oncologists. To encourage physicians to use these methods, this review introduces the methods and demonstrates the analyses using R and FleXScan with sample data.

Software used for statistical analysis 

R version 2.14.2 (R Foundation for Statistical Computing, Vienna, Austria) and the packages "spdep", "DCluster", and "classInt" were used. R can be downloaded from the website. 3 FleXScan software version 3.1 (FleXScan; National Institute of Public Health, Tokyo, Japan) was used to conduct global clustering tests using Tango's index. 9 The users' guide can also be downloaded from the website. 4

For regression analysis in an econometric model, 7 the incidence of disease in each municipality and the number of hospitals that employ full-time hematologists were used. These data were collected from interviews with hematology physicians and from the hospitals' websites.

The age-adjusted disease incidence was calculated using the 1985 model population of Japan 10 and the 2008 model population of Yamagata Prefecture. 11 The detailed method of spatial analysis using R has been described elsewhere. 7,8
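Age-adjusted incidence of this kind rests on direct standardization: each age-specific rate is weighted by the share of a standard ("model") population in that age band. The sketch below illustrates only the arithmetic, in R as used throughout this review. The helper age_adjusted_rate(), the three age bands, and all counts are hypothetical; the actual 1985 Japanese model population and 2008 Yamagata population figures are not reproduced here.

```r
# Direct age standardization: a minimal sketch with hypothetical numbers.
# age_adjusted_rate() is a hypothetical helper, not part of spdep or DCluster.
age_adjusted_rate <- function(cases, population, std_population,
                              per = 100000) {
  stopifnot(length(cases) == length(population),
            length(cases) == length(std_population),
            all(population > 0))
  age_specific <- cases / population               # rate in each age band
  weights <- std_population / sum(std_population)  # standard age structure
  sum(age_specific * weights) * per                # weighted average rate
}

# One hypothetical municipality with three age bands (not registry data)
cases      <- c(2, 5, 12)
population <- c(4000, 3000, 2000)
std_pop    <- c(5000, 3000, 2000)   # hypothetical standard population

crude    <- sum(cases) / sum(population) * 100000
adjusted <- age_adjusted_rate(cases, population, std_pop)
round(c(crude = crude, adjusted = adjusted), 1)
```

Here the hypothetical standard population is younger than the study population, so the adjusted rate (195 per 100,000) falls below the crude rate (about 211 per 100,000). Because the result depends on the chosen standard, switching standards can change which municipalities appear to have high incidence, which may be one reason the clusters in panels B and C of Figure 3 differ.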

Data used for analysis 

The data related to hematological malignant diseases, including malignant lymphoma, leukemia, and multiple myeloma, between 2000 and 2008 were provided by the cancer registry of Yamagata Prefecture. The data included the type of disease, date of onset of disease, age, sex, and the city where each patient lived. The cancer registry in Yamagata Prefecture is of sufficient quality; in 2008, the rates of death certificate notification and death certificate only were 18.5% and 5.9%, respectively. 12 The data from the registry are included in the IARC (International Agency for Research on Cancer) Scientific Publications entitled "Cancer Incidence in Five Continents". 13

Preparing datasets: first step 

As the first step, the dataset must be prepared as a "csv file". Microsoft Excel® (Microsoft; Redmond, WA, USA) is used to prepare a table including the following data as columns: the names of regions or their identifications, the x and y coordinates on a plane rectangular coordinate system, longitude and latitude, the population, incidences of diseases, and the explanatory variable.

[Figure 1 Preparing the dataset. Panel labels: "x, y coordinates on a plane rectangular coordinate system"; "Population"; "Crude and age-adjusted disease incidence"; "These are used in spatial regression analysis."]

Table 1 Example data set for analysis using R in the style of "csv file" format.
Notes: Data set includes the names of the municipalities in Yamagata Prefecture as names and regions, the x, y coordinates of the municipalities on a plane rectangular coordinate system, the longitudes and latitudes of the municipalities, the population, and the incidences of diseases.
Abbreviations: ageadj, age-adjusted; Dr, doctor.

The example dataset is shown in Figure 1 and Table 1 . 
It includes the names of the municipalities in Yamagata 
Prefecture as names and regions, the x, y coordinates of the 
municipalities on a plane rectangular coordinate system, the 
longitudes and latitudes of the municipalities, the population, 
and the incidences of diseases. As the explanatory variable, 
the number of doctors in the municipalities was included. 
Age-adjusted disease incidences were also included; in Figure 1, they were calculated using the 1985 model population of Japan 10 and the 2008 model population of Yamagata Prefecture. 11 This dataset was saved as a "csv file" ("blood.csv" in this review).
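Before moving on, it can help to confirm that the saved file has the columns that the R commands in Figures 2 and 4 refer to. The sketch below writes a hypothetical three-row stand-in for "blood.csv" to a temporary directory, reads it back with read.csv(), and checks the column names. The names cases, case_ageadj, population_1000, easting, northing, longitude, latitude, and number_Dr are taken from those commands; the municipality column name and every value are invented for illustration.

```r
# Hypothetical three-row stand-in for "blood.csv"; all values are invented.
blood <- data.frame(
  municipality    = c("Town A", "Town B", "Town C"),  # hypothetical names
  easting         = c(1200.5, 1350.2, 1500.9),         # plane rectangular x
  northing        = c(-2100.3, -2050.7, -1980.1),      # plane rectangular y
  longitude       = c(140.1, 140.3, 140.5),
  latitude        = c(38.2, 38.4, 38.6),
  population_1000 = c(250.0, 40.5, 12.3),              # population (thousands)
  cases           = c(180, 30, 9),                     # crude case counts
  case_ageadj     = c(25.1, 27.3, 24.8),               # age-adjusted incidence
  number_Dr       = c(12, 1, 0)                        # explanatory variable
)

# The review saves the file in R's working directory; a temp dir is used here
csv_path <- file.path(tempdir(), "blood.csv")
write.csv(blood, csv_path, row.names = FALSE)

# Re-read the file, as the analysis in R would
blood <- read.csv(csv_path)

required <- c("cases", "case_ageadj", "population_1000", "easting",
              "northing", "longitude", "latitude", "number_Dr")
missing_cols <- setdiff(required, names(blood))
if (length(missing_cols) > 0) {
  stop("blood.csv lacks columns: ", paste(missing_cols, collapse = ", "))
}
str(blood)
```

If municipality names are stored in Japanese, read.csv() may need a fileEncoding argument that matches how Excel saved the file (for example, "CP932" on Japanese Windows systems).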



Preparing for the analysis: second step

These are the instructions that were used for the analysis:

• Go to the Excel Save menu

• Save the worksheet as a "csv file" ("blood.csv") in the R working directory (the working directory can be set using the preferences menu of R)

• Close Excel

• Start R by double-clicking the desktop icon

• R shows the prompt symbol (>) and then waits for input commands

• Select "Packages" from the main menu, select "Install package(s)", choose a CRAN (Comprehensive R Archive Network; http://cran.r-project.org ) site, and select the "spdep" and "DCluster" packages to download and install.
Figure 2 Instructions for Pearson's chi-squared test and Tango's test using R with the "spdep" and "DCluster" packages.

Preparing the data frame for analysis. Type:

> library(spdep)
> library(DCluster)
> blood <- read.csv("blood.csv")  # dataset saved in the first step
> blood_OE <- data.frame(Observed=blood$cases)
> blood_OE <- cbind(blood_OE, Expected=round(blood$population_1000*sum(blood$cases)/sum(blood$population_1000), digits=1), x=blood$easting, y=blood$northing)
> achisq.stat(blood_OE, lambda=1)

In this analysis, "cases" gives the crude incidences of hematological diseases. Age-adjusted disease incidences ("case_ageadj") can be used instead of crude incidences ("cases"), and the incidences of specific diseases (such as leukemia or lymphoma) can also be used.

Output:

$T
[1] 62.3842
$df
[1] 34
$pvalue
[1] 0.002118706

This shows P=0.002 in Pearson's chi-squared test, indicating that the observed numbers of cases differ significantly from the numbers expected if the risk of disease were the same in every municipality.

Preparing coordinate data for analysis. Type:

> # x and y were already added to blood_OE in the previous step
> coords <- as.matrix(blood_OE[, c("x", "y")])
> dlist <- dnearneigh(coords, 0, Inf)
> dlist <- include.self(dlist)
> dlist.d <- nbdists(dlist, coords)
> # weights decay with distance: exp(-d)
> col.W.tango <- nb2listw(dlist, glist=lapply(dlist.d, function(x) {exp(-x)}), style="C")
> tango.test(Observed~offset(log(Expected)), blood_OE, model="poisson", R=100, list=col.W.tango, zero.policy=TRUE)

Output:

Tango's test of global clustering
Type of boots: parametric
Model used when sampling: Poisson
Number of simulations: 100
Statistic: 0.0007150427   (Tango's index for spatial clustering)
P-value: 0.0198019

This shows P=0.0198 in Tango's test. Tango's index is one of the most widely used spatial statistics for assessing whether spatially distributed disease rates are independent or clustered. In this analysis, the P value is <0.05, indicating significant disease clustering in Yamagata Prefecture.



Conducting the analysis using R: third step

The instructions for spatial analysis with Pearson's chi-squared test and Tango's test using R are shown in Figure 2. Tango's test indicates significant global clustering of hematological diseases (P=0.0198).
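The statistic that achisq.stat() reports in Figure 2 is Pearson's chi-squared statistic, X^2 = sum over i of (O_i - E_i)^2 / E_i, where the expected counts E_i are built in Figure 2 as population_i x (total cases / total population). The base-R sketch below reproduces that calculation on hypothetical counts to show what the test measures. The helper pearson_heterogeneity() is hypothetical, and its n - 1 degrees of freedom (one pooled rate estimated) is an assumption that agrees with the df of 34 reported for 35 municipalities but may not match DCluster's conventions for every lambda setting.

```r
# Pearson's chi-squared test for heterogeneity of risk between areas.
# pearson_heterogeneity() is a hypothetical helper written for illustration.
pearson_heterogeneity <- function(observed, population) {
  stopifnot(length(observed) == length(population), all(population > 0))
  # Expected counts if every area had the pooled (overall) rate
  expected <- population * sum(observed) / sum(population)
  statistic <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1   # the pooled rate is estimated from the data
  list(statistic = statistic, df = df,
       p.value = pchisq(statistic, df = df, lower.tail = FALSE),
       expected = expected)
}

# Hypothetical counts for four areas of equal population (not registry data)
obs <- c(10, 20, 30, 40)
pop <- c(100, 100, 100, 100)   # population in thousands
res <- pearson_heterogeneity(obs, pop)
res$statistic
res$p.value
```

The same statistic is returned by base R's chisq.test(obs, p = pop / sum(pop)). A small P value says only that risk varies between areas; it does not say whether high-risk areas lie next to one another, which is the question Tango's test addresses.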

FleXScan is another useful tool for spatial analysis that detects disease clustering. The results of global clustering tests using Tango's index in FleXScan are shown in Figure 3. Instructions are available in the users' guide, which can be downloaded from the website. 4 A map of Yamagata Prefecture can be downloaded from the freemap website ( http://www.freemap.jp ).
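Tango's index, used both by tango.test() in Figure 2 and by FleXScan, measures how far the observed shares of cases, r_i = O_i / sum(O), depart from the expected shares, p_i = E_i / sum(E), weighting each pair of areas by their closeness: C = (r - p)' A (r - p), with a_ij = exp(-d_ij / s) for a distance scale s. The base-R sketch below computes C and a Monte Carlo P value with a parametric Poisson bootstrap, the scheme named in the Figure 2 output. The helpers tango_index() and tango_mc_test(), the 5 x 5 grid, the injected cluster, and s = 1 are hypothetical choices for illustration; weight standardization and other details may differ from DCluster and FleXScan.

```r
# Tango's index of global clustering with a Monte Carlo P value.
# tango_index() and tango_mc_test() are hypothetical helpers for illustration.
tango_index <- function(observed, expected, A) {
  r <- observed / sum(observed)   # observed share of cases in each area
  p <- expected / sum(expected)   # expected share if there is no clustering
  d <- r - p
  as.numeric(t(d) %*% A %*% d)    # quadratic form (r - p)' A (r - p)
}

tango_mc_test <- function(observed, expected, coords, distance_scale,
                          R = 999) {
  D <- as.matrix(dist(coords))          # Euclidean distances between areas
  A <- exp(-D / distance_scale)         # closeness weights, diagonal = 1
  t0 <- tango_index(observed, expected, A)
  # Parametric Poisson bootstrap: counts simulated with means E_i (no clustering)
  sims <- replicate(R, {
    y <- rpois(length(expected), expected)
    # guard: an all-zero draw has no shares to compare
    if (sum(y) == 0) 0 else tango_index(y, expected, A)
  })
  list(statistic = t0, p.value = (1 + sum(sims >= t0)) / (R + 1))
}

# Hypothetical 5 x 5 grid of equal-sized areas with one block of excess cases
set.seed(1)
grid <- expand.grid(x = 1:5, y = 1:5)
expected <- rep(10, nrow(grid))
observed <- rpois(nrow(grid), expected)
hot <- grid$x <= 2 & grid$y <= 2
observed[hot] <- observed[hot] + 15     # injected (hypothetical) cluster
res <- tango_mc_test(observed, expected, grid, distance_scale = 1, R = 199)
res
```

With R simulations, the smallest attainable Monte Carlo P value is 1/(R + 1). The P value of 0.0198019 in Figure 2 equals 2/101, which is consistent with one of the 100 simulated statistics reaching the observed value; a larger R (for example, 999) gives a finer-grained P value.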



The impact of medical supply on disease incidence can be examined by spatial regression analysis using R with the package "spdep". With spatial data, one can test whether disease incidence, as the objective variable, is related to the explanatory variables. The instructions are shown in Figure 4. Detailed information on spatial statistics and on the method of spatial analysis using R has been described elsewhere. 7,8,14
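The spatial lag model fitted by lagsarlm() in Figure 4 has the form y = rho W y + X beta + e, where W is a row-standardized neighbour matrix (style = "W"). To make the estimation concrete, the base-R sketch below fits that model by maximizing the concentrated (profile) log-likelihood over rho. It is a teaching sketch on simulated data under stated assumptions (a dense W, Gaussian errors, rho restricted to -0.99 to 0.99); the helpers row_standardise(), sar_loglik(), and fit_sar_lag() are hypothetical, and the sketch omits the standard errors and tests that lagsarlm() reports.

```r
# Spatial lag (SAR) model y = rho * W y + X beta + e with Gaussian errors,
# fitted by maximising the concentrated log-likelihood over rho.
# row_standardise(), sar_loglik() and fit_sar_lag() are hypothetical helpers.
row_standardise <- function(B) {
  rs <- rowSums(B)
  B / ifelse(rs == 0, 1, rs)   # like style = "W"; rows without neighbours stay 0
}

sar_loglik <- function(rho, y, X, W) {
  n <- length(y)
  A <- diag(n) - rho * W
  Ay <- as.vector(A %*% y)
  res <- lm.fit(X, Ay)$residuals      # beta(rho) by least squares on A y
  sigma2 <- sum(res^2) / n            # ML estimate of the error variance
  logdet <- as.numeric(determinant(A, logarithm = TRUE)$modulus)
  logdet - n / 2 * log(2 * pi * sigma2) - n / 2
}

fit_sar_lag <- function(y, X, W, interval = c(-0.99, 0.99)) {
  opt <- optimize(sar_loglik, interval, y = y, X = X, W = W, maximum = TRUE)
  rho <- opt$maximum
  Ay <- as.vector((diag(length(y)) - rho * W) %*% y)
  list(rho = rho, beta = lm.fit(X, Ay)$coefficients, loglik = opt$objective)
}

# Hypothetical 15 x 15 lattice of areas with rook contiguity
set.seed(42)
cells <- expand.grid(i = 1:15, j = 1:15)
n <- nrow(cells)
D <- as.matrix(dist(cells))
W <- row_standardise((abs(D - 1) < 1e-9) * 1)   # neighbours at distance 1

# Simulated data with true rho = 0.6, intercept 2, slope 0.5
x <- rnorm(n)
X <- cbind(Intercept = 1, x = x)
y <- as.vector(solve(diag(n) - 0.6 * W, 2 + 0.5 * x + rnorm(n)))

fit <- fit_sar_lag(y, X, W)
fit$rho
fit$beta
```

At rho = 0 the concentrated log-likelihood reduces to the ordinary least-squares log-likelihood, which is why lagsarlm() can report an AIC for the corresponding lm model alongside the lag model. For real analyses, the packaged function is preferable; note that in current releases lagsarlm() is provided by the spatialreg package rather than spdep.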

Figure 3 Disease cluster analysis by Tango's index using crude and age-adjusted disease incidences by region of Yamagata Prefecture.
Notes: Crude (A) and age-adjusted disease incidences using the 1985 model population of Japan (B) and the 2008 population of Yamagata Prefecture (C), by region of Yamagata Prefecture. Disease clusters using crude incidences are shown for Tsuruoka, Sakata, Obanazawa, Mogami, Funagata, Mamurogawa, Okura, Mikawa, Shonai, and Yuza (P=0.048). Disease clusters using age-adjusted disease incidences and the 1985 model population of Japan are shown for Yamagata, Kaminoyama, and Takahata (P=0.001). Disease clusters using the age-adjusted disease incidences and the 2008 population of Yamagata Prefecture are shown for Kaminoyama (P=0.001). Points and lines indicate municipalities and their contiguous areas, respectively. Disease clusters are shown by black dots with red lines.

Usefulness of spatial statistics in hematology and oncology

In this review, spatial statistical analysis was implemented in the field of hematology using current techniques. All of the tools used are available free of charge. It was demonstrated that hematology/oncology physicians can use these tools to compile data and implement such an analysis in various settings. One advantage of this technique is that hypotheses about spatial clustering can be tested statistically. It enables a spatial statistical investigation of disease clustering, whereas in the past such clustering could only be estimated visually by plotting disease incidence. 9 This method is useful because it enables scientific validation of the clinical impressions of patient clustering that clinicians often form in daily clinical practice.

Figure 4 Instructions for spatial auto-regression analysis using R with the package "spdep".

Preparing the adjacency matrix for analysis (longitude and latitude are used in this analysis). Type:

library(spdep)
# tri2nb() builds Delaunay-triangulation neighbours and needs the deldir package;
# in current releases lagsarlm() is provided by the spatialreg package
coords <- matrix(0, nrow=length(blood$case_ageadj), ncol=2)
coords[,1] <- blood$longitude
coords[,2] <- blood$latitude
lph.tri.nb <- tri2nb(coords)
lph.lag <- lagsarlm(blood$case_ageadj~number_Dr, data=blood, nb2listw(lph.tri.nb, style="W"))
summary(lph.lag)

Age-adjusted incidences ("case_ageadj") are used here; crude incidences ("cases") or the incidences of specific diseases (such as leukemia or lymphoma) can also be used.

Output:

Call:lagsarlm(formula = blood$case_ageadj ~ number_Dr, data = blood, listw = nb2listw(lph.tri.nb, style = "W"))
Residuals:
        Min          1Q      Median          3Q         Max
-13.9143064  -2.4598636   0.0087099   2.4144788  11.0247369
Type: lag
Coefficients: (asymptotic standard errors)
            Estimate Std. Error z value  Pr(>|z|)
(Intercept) 22.98859    6.75187  3.4048 0.0006622
number_Dr    0.18274    0.32700  0.5589 0.5762635
Rho: 0.098211, LR test value: 0.11509, P-value: 0.73442
Asymptotic standard error: 0.26125
    z-value: 0.37593, P-value: 0.70697
Wald statistic: 0.14132, P-value: 0.70697
Log likelihood: -106.0312 for lag model
ML residual variance (sigma squared): 25.008, (sigma: 5.0008)
Number of observations: 35
Number of parameters estimated: 4
AIC: 220.06, (AIC for lm: 218.18)
LM test for residual autocorrelation
Test value: 0.26683, P-value: 0.60546

This analysis tests whether disease incidence, as the objective variable, is related to the explanatory variables while accounting for spatial dependence; the number of hematologists is not significantly related to disease incidence in this spatial auto-regression analysis (P=0.576 for number_Dr).

The present analysis showed that, when adjusted for age, hematological malignancies in Yamagata Prefecture were significantly clustered in Yamagata City and
its environs. However, in interpreting this result, consideration 
must be given to the role of health care providers. Specifically, 
care for hematological malignancies is highly specialized, and 
diagnosis is difficult in medically underserved regions, such as 
residential areas that are far from a hospital that has a specialist 
physician, and there is concern that the incidence of disease 
might be underestimated. This possibility can also be assessed with the spatial analysis technique presented here. Although the present
data show that the number of hematologists in a municipality 
is not a factor clearly related to incidence, it would be possible 
to assess for each disease a variety of different variables other 
than the number of specialist physicians in the area, such as the 
number of hospitals or the number of outpatient visits to special- 
ist hematological departments for each municipality. 

This review has presented a method for analyzing the spatial clustering of hematological malignancies. Although the present analysis was performed at the municipality level, GIS data for even smaller districts could also be used, enabling a more detailed spatial epidemiological analysis. 15,16 However, the comprehensive acquisition of cancer information is limited in that such data can only be obtained in places with a highly precise cancer registry, such as Yamagata Prefecture. This limitation is expected to be resolved by the expansion of cancer registration systems.

The etiology of most hematological diseases has not been 
elucidated. Investigation of these epidemiological aspects 
may potentially contribute to a better understanding of the 
etiology of these diseases. In addition, applying the technique 
presented to the investigation of patient prognoses may enable 
generation of data that are also useful for solving health 
policy-related problems, such as the optimal distribution of 
medical resources. 